Mapping of Sequence Reads to the Reference Genomes ◾ 73
So far, we have two directories: “data” where the FASTQ files were downloaded and
“refgenome” where the FASTA sequence of the reference genome was downloaded and
indexed.
In the next step, we will use any of the alignment algorithms to map the reads to the
human reference genome. The “bwa mem” algorithm can be tried first as recommended
by the BWA.
1-BWA-MEM algorithm
In the following, the “bwa mem” command maps the reads in the FASTQ files to the
human reference genome and saves the output file, which contains the mapped reads in a
new directory called “bwa_sam”. Before running the commands, make sure that the work-
ing directory is just one level out of the two directories.
mkdir sam
bwa mem \
-t 4 \
refgenome/GRCh38.p13_ref.fna \
data/SRR769545_1.fastq.gz \
data/SRR769545_2.fastq.gz \
> sam/SRR769545_mem.sam 2> sam/SRR769545_mem.log
The “bwa mem” command has the capability of mapping reads of a length ranging between
70 bp and 1 Mbp. The BWA-MEM algorithm uses seeding alignments with maximal exact
matches (MEMs) and then it extends seeds with the affine-gap SW algorithm. We can run
“bwa mem” on the command line without any option to learn more about “bwa mem”
command. In the above, we use “-t 4” so that the program can use four parallel threads to
perform read mapping. Then, we provided the path of the FASTA file name of the reference
genome and the two FASTQ file names. The output of the aligned reads will be redirected
to a new file “SRR769545_mem.sam” using the Unix/Linux redirection symbol “>”. The
command execution comments are redirected to the log file “SRR769545_mem.log” using
“2>”, which is used in Unix/Linux to redirect stderr (standard error) to a text file. Both the
output files will be saved in the “sam” directory. The log file contains important informa-
tion about the comments and any standard error that occurs during the mapping process.
Always open the log file if the running fails. The log file will tell you if the failure is due to
the wrong syntax, wrong file path, or any other standard error. Even if the program has
finished running normally, it is a good practice to open the log file and look at the statistics
and comments.
cd sam
less SRR769545_mem.log
You can also display the SAM file produced by “bwa mem” with the following:
less -S SRR769545_mem.sam